feat(sqs): htfifo capability poller (Phase 3.D PR 4-B-3a) #721
Adds a stateless helper that polls each peer's /sqs_health
endpoint and reports whether all advertise the htfifo capability.
This is the building block PR 5's CreateQueue gate uses to refuse
PartitionCount > 1 until every node in the cluster has the
HT-FIFO data plane.
What changes
adapter/sqs_capability_poller.go (new file):
- HTFIFOCapabilityReport: AllAdvertise + per-peer detail.
- HTFIFOCapabilityPeerStatus: address, HasHTFIFO flag, raw
capabilities slice, Error string for failure detail.
- PollSQSHTFIFOCapability(ctx, client, peers): polls each peer
concurrently. Returns AllAdvertise=false on any timeout, HTTP
error, malformed JSON, or missing-capability — fail-closed
per §8.5.
- Vacuously AllAdvertise=true on empty peer list (caller is
responsible for ensuring the peer list is meaningful).
- Per-peer timeout capped by defaultSQSCapabilityPollTimeout
(3s) so a single hung peer cannot stall the whole poll.
- Body capped at 1 KiB via io.LimitReader so a misconfigured
peer cannot drain memory.
- Bare host:port and full http://… / https://… URLs both
accepted — operators can front the endpoint with TLS or a
proxy without the helper having to know.
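For orientation, here is a minimal sketch of the contract the bullets above describe. The exported names and constants follow this description; the JSON body shape, URL handling, and error strings are assumptions, not the actual adapter/sqs_capability_poller.go implementation.

```go
// Sketch only — mirrors the described contract; response field names and
// URL handling are assumptions.
package adapter

import (
	"context"
	"encoding/json"
	"io"
	"net/http"
	"strings"
	"time"
)

const (
	defaultSQSCapabilityPollTimeout = 3 * time.Second // per-peer cap
	sqsCapabilityMaxBodyBytes       = 1 << 10         // 1 KiB body cap
	sqsHealthPath                   = "/sqs_health"
	sqsCapabilityHTFIFO             = "htfifo"
)

type HTFIFOCapabilityPeerStatus struct {
	Address      string
	HasHTFIFO    bool
	Capabilities []string
	Error        string
}

type HTFIFOCapabilityReport struct {
	AllAdvertise bool
	Peers        []HTFIFOCapabilityPeerStatus
}

// PollSQSHTFIFOCapability polls every peer concurrently; any failure or
// missing capability fails closed (AllAdvertise=false).
func PollSQSHTFIFOCapability(ctx context.Context, client *http.Client, peers []string) HTFIFOCapabilityReport {
	statuses := make([]HTFIFOCapabilityPeerStatus, len(peers))
	done := make(chan int, len(peers)) // indexed results: no slice-write races
	for i, peer := range peers {
		go func(i int, peer string) {
			statuses[i] = pollOnePeer(ctx, client, peer)
			done <- i
		}(i, peer)
	}
	for range peers {
		<-done
	}
	report := HTFIFOCapabilityReport{AllAdvertise: true, Peers: statuses} // vacuously true when empty
	for _, s := range statuses {
		report.AllAdvertise = report.AllAdvertise && s.HasHTFIFO
	}
	return report
}

func pollOnePeer(ctx context.Context, client *http.Client, peer string) HTFIFOCapabilityPeerStatus {
	status := HTFIFOCapabilityPeerStatus{Address: peer}
	ctx, cancel := context.WithTimeout(ctx, defaultSQSCapabilityPollTimeout) // per-peer cap
	defer cancel()

	base := peer
	if !strings.HasPrefix(base, "http://") && !strings.HasPrefix(base, "https://") {
		base = "http://" + base // bare host:port accepted alongside full URLs
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, base+sqsHealthPath, nil)
	if err != nil {
		status.Error = err.Error()
		return status
	}
	resp, err := client.Do(req)
	if err != nil {
		status.Error = err.Error() // timeout / connection refused: fail closed
		return status
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		status.Error = resp.Status
		return status
	}
	var body struct {
		Capabilities []string `json:"capabilities"` // assumed field name
	}
	if err := json.NewDecoder(io.LimitReader(resp.Body, sqsCapabilityMaxBodyBytes)).Decode(&body); err != nil {
		status.Error = err.Error() // malformed or truncated JSON: fail closed
		return status
	}
	status.Capabilities = body.Capabilities
	for _, c := range body.Capabilities {
		if c == sqsCapabilityHTFIFO {
			status.HasHTFIFO = true
		}
	}
	return status
}
```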
What does NOT change yet
- htfifoCapabilityAdvertised stays false. PR 4-B-3b adds the §8
leadership-refusal hook (startup + per-acquisition observer)
and flips this flag.
- CreateQueue does NOT yet call PollSQSHTFIFOCapability — PR 5
lifts the PartitionCount > 1 dormancy gate AND wires the
capability check in the same commit.
Tests
adapter/sqs_capability_poller_test.go: 9 top-level tests across
the contract surface.
- AllAdvertise happy path with multiple peers.
- One-bad-apple: a peer with empty capabilities drops AllAdvertise.
- Transport failures (HTTP 500, connection refused, malformed
JSON) all fail closed with non-empty Error.
- Hung peer respects per-peer timeout — test bound is well below
what a serial poll would take.
- Empty peer list → AllAdvertise vacuously true.
- Empty peer address → fail closed with explicit Error.
- Full-URL peer (http:// or https://) accepted alongside bare
host:port.
- Concurrent polling: 5 peers each delaying 200ms must finish in
well under 1 second (serial would take 1s+).
- Body-size limit: a 10 KiB response truncated mid-string is
surfaced as a JSON parse error, not a half-decoded value.
- TestBuildSQSHealthURL covers the URL construction edge cases.
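The concurrent-polling bound is the kind of claim the timing test pins. A minimal sketch of that assertion, assuming the (ctx, client, peers) signature from this PR's description and a `{"capabilities": [...]}` response shape (both assumptions); the test and server names are illustrative:

```go
package adapter // assumed package; fragment of a _test.go file

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

func TestConcurrentPollFinishesFast(t *testing.T) {
	const peerDelay = 200 * time.Millisecond
	var peers []string
	for i := 0; i < 5; i++ {
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			time.Sleep(peerDelay)
			// Response shape is an assumption; the poller only needs to see "htfifo".
			_, _ = w.Write([]byte(`{"capabilities":["htfifo"]}`))
		}))
		t.Cleanup(srv.Close)
		peers = append(peers, srv.URL)
	}

	start := time.Now()
	report := PollSQSHTFIFOCapability(context.Background(), http.DefaultClient, peers)
	elapsed := time.Since(start)

	if !report.AllAdvertise {
		t.Fatalf("expected AllAdvertise=true, got %+v", report)
	}
	// Serial polling would need >= 5 × 200ms = 1s; concurrent polling stays well under.
	if elapsed >= time.Second {
		t.Fatalf("poll took %v, want well under 1s", elapsed)
	}
}
```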
@claude review Phase 3.D PR 4-B-3a — htfifo capability poller. Pure-additive helper that PR 5's CreateQueue gate will consume.
Claude finished @bootjp's task in 2m 18s — Code Review: feat(sqs): htfifo capability poller (Phase 3.D PR 4-B-3a)
Verdict: Approve with minor notes. The implementation is correct. The concurrency model is clean, the fail-closed contract is consistent, and the test coverage is thorough. A few small issues worth calling out below, none blocking.
| Lens | Finding |
|---|---|
| Data loss | None — read-only poll, no FSM/Pebble path |
| Concurrency | Correct — indexed channel avoids slice-write races; per-peer timeout enforced; no goroutine leaks |
| Performance | Concurrent polling confirmed by test; control-plane only, no hot-path impact |
| Data consistency | Fail-closed contract is sound and consistent with §8.5; empty-peer vacuous-true is documented |
| Test coverage | Excellent; two minor gaps noted (per-peer cap not exercised independently; double-path URL edge case not pinned) |
Code Review
This pull request introduces a new SQS capability poller that concurrently checks cluster peers for the htfifo capability. The implementation includes structured reporting, per-peer timeouts, and response body size limits to ensure reliability. Feedback was provided regarding the silent ignoring of errors when closing the HTTP response body, suggesting that these should be logged to provide visibility into potential resource cleanup issues.
```go
	status.Error = errors.Wrapf(err, "GET %q", url).Error()
	return status
}
defer func() { _ = resp.Body.Close() }()
```
The error from resp.Body.Close() is being silently ignored. According to the general rules, errors from Close() methods on resources like network connections should be logged to ensure resource leaks or other cleanup problems are visible. If a logger is not available in this context, consider how to surface this failure or ensure it is handled according to the project's reliability standards.
References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
PR #721 round 1 review fixes:

1) Gemini medium — Body.Close error silently ignored
   defer func() { _ = resp.Body.Close() }() dropped any close error. A failed Close indicates a connection that the http.Transport will tear down rather than reuse — under load this can mask leaking connections / file descriptors. Logged via slog.Warn with the peer address so operators can grep the cluster log when triaging.

2) Claude minor — non-200 body not drained
   resp.Body.Close() without first draining the body prevents the http.Transport from reusing the underlying TCP connection. In a control-plane path (one CreateQueue call per gate check) this is acceptable, but if the gate ever fans out across many peers under load, the failed-peer branch would force connection teardown on every error response. Drain via io.Copy(io.Discard, io.LimitReader(resp.Body, sqsCapabilityMaxBodyBytes)) before the early return so the transport can reuse the connection.

3) Claude nit — per-peer cap not exercised independently
   TestPollSQSHTFIFOCapability_TimeoutFailsClosed used a 500ms parent ctx — the request actually timed out via the parent context, not the per-peer cap. The default 3s per-peer cap was never independently exercised by tests. Refactored signature to PollSQSHTFIFOCapability(ctx, peers, cfg PollerConfig). PollerConfig{HTTPClient, PerPeerTimeout} is the single options surface — zero values pick safe defaults. Renamed the existing test to TestPollSQSHTFIFOCapability_ParentContextDeadlineFailsClosed and added TestPollSQSHTFIFOCapability_PerPeerTimeoutFailsClosed, which uses context.Background() and PerPeerTimeout=100ms to exercise the cap independently. PerPeerTimeout is also a sensible operator knob — different cluster latencies want different bounds. Caller-side: only *_test.go files use the function today; PR 5's CreateQueue gate will pick the appropriate timeout when it wires this up.

4) Claude minor — buildSQSHealthURL double-path edge case
   A caller passing a URL that already includes the health path (e.g. "http://node:5050/sqs_health") would receive a doubled path. Added an explicit test case to TestBuildSQSHealthURL documenting the behavior and the contract ("pass a base URL or host:port, never a full request URL"). A future refactor can intentionally change the contract; the test will catch it.

5) Audit per the lessons-learned discipline
   PollSQSHTFIFOCapability is exported but has no production callers yet — only the test file references it. grep confirmed the API change is safe.
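A minimal sketch of the PollerConfig options surface described in item 3 above — the field names come from the commit message, the defaulting bodies are assumptions:

```go
package adapter // assumed package

import (
	"net/http"
	"time"
)

// PollerConfig is the single options surface; zero values pick safe defaults.
type PollerConfig struct {
	HTTPClient     *http.Client  // nil → http.DefaultClient (assumed default)
	PerPeerTimeout time.Duration // 0 → defaultSQSCapabilityPollTimeout (3s)
}

func (c PollerConfig) withDefaults() PollerConfig {
	if c.HTTPClient == nil {
		c.HTTPClient = http.DefaultClient
	}
	if c.PerPeerTimeout <= 0 {
		c.PerPeerTimeout = 3 * time.Second
	}
	return c
}

// Example caller: a gate that wants a tighter bound than the 3s default could pass
//   PollSQSHTFIFOCapability(ctx, peers, PollerConfig{PerPeerTimeout: 500 * time.Millisecond})
```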
|
@claude review Round 2 — addresses Round 1 review:

1. Gemini medium — Body.Close error silently ignored: now logged via `slog.Warn` with the peer address.
2. Claude minor — non-200 body not drained: now drained via `io.Copy(io.Discard, io.LimitReader(...))` before the early return.
3. Claude nit — per-peer cap not exercised independently: refactored to `PollSQSHTFIFOCapability(ctx, peers, cfg PollerConfig)` with a dedicated per-peer-timeout test.

Bonus: 4. Claude minor — `buildSQSHealthURL` double-path edge case pinned by an explicit `TestBuildSQSHealthURL` case.
Claude finished @bootjp's task in 2m 8s — Code Review: Round 2 — feat(sqs): htfifo capability poller
Verdict: LGTM. All four Round 1 items are correctly addressed. No new issues found.

Round 1 feedback checklist

1. Body.Close() error logged (Gemini medium) ✅

```go
// adapter/sqs_capability_poller.go:204-209
defer func() {
	if cerr := resp.Body.Close(); cerr != nil {
		slog.Warn("sqs capability poller: response body close failed",
			"peer", peer, "err", cerr)
	}
}()
```

Correct. Uses slog.Warn with the peer address.

2. Non-200 body drained before early return (Claude minor) ✅

```go
// adapter/sqs_capability_poller.go:211-215
if resp.StatusCode != http.StatusOK {
	_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, sqsCapabilityMaxBodyBytes))
	status.Error = fmt.Sprintf("%s returned HTTP %d", url, resp.StatusCode)
	return status
}
```

Correct. The drain is bounded by the same 1 KiB sqsCapabilityMaxBodyBytes cap.

3. PollerConfig.PerPeerTimeout; per-peer cap exercised independently (Claude nit) ✅

The two tests now exercise distinct timeout paths:

```go
// Parent-ctx path (line 143): ctx expires at 500ms < 3s per-peer cap
ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)

// Per-peer-cap path (line 179-180): context.Background() + 100ms cap
report := PollSQSHTFIFOCapability(context.Background(), []string{addrHang},
	PollerConfig{PerPeerTimeout: 100 * time.Millisecond})
```

The per-peer cap is now exercised independently of the parent context.

4. buildSQSHealthURL double-path documented in test (Claude minor) ✅

```go
// adapter/sqs_capability_poller_test.go:291-297
{
	"http://node.example:5050" + sqsHealthPath,
	"http://node.example:5050" + sqsHealthPath + sqsHealthPath,
},
```

The comment ("pin the behaviour so a future refactor can either keep it or change the contract intentionally") is the right framing — document and lock rather than silently accept. No caller today passes a full request URL to the poller, so this is a latent trap, not an active bug.

One micro-observation (not blocking)

Summary

The concurrency model, fail-closed contract, body-size cap, and all test coverage from Round 1 remain intact. Ready to merge.
Closes the Phase 3.D PR 4-B-2 → PR 5 chain by adding the §8
leadership-refusal safeguard and flipping htfifoCapabilityAdvertised
to true. This is the final piece of the routing+leadership-refusal
pair the §11 PR 4 contract requires before a binary is "marked
htfifo-eligible".
What changes
raftengine: leader-acquired observer (mirror of leader-loss)
- raftengine.Admin gains RegisterLeaderAcquiredCallback. Same
contract as RegisterLeaderLossCallback (non-blocking, panic-
contained, sentinel-pointer deregister) but fires on the
previous!=Leader → status==Leader edge instead of leaving
the leader role.
- etcd backend: new leaderAcquiredCbs slice + mutex; fires from
refreshStatus on the leader-acquired edge AFTER e.isLeader is
published, so a callback that calls engine.State() observes
StateLeader.
- register / fire helpers extracted (registerLeaderCallback,
gatherLeaderCallbacks) so the leader-loss and leader-acquired
paths share one slot-management implementation. The dupl
lint warning that fired on the first draft is the check that
keeps this consolidated.
- Test coverage: leader_acquired_callback_test.go mirrors
leader_loss_callback_test.go — panic containment, empty-list
safety, deregister removal, deregister idempotence, nil-fn
safety, nil-receiver safety, identical-fn disambiguation.
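A minimal sketch of the observer contract described above — struct, field, and helper names are illustrative, not the etcd backend's actual code:

```go
package raftenginesketch // illustrative, not the real package layout

import (
	"log/slog"
	"sync"
	"sync/atomic"
)

type engine struct {
	mu          sync.Mutex
	acquiredCbs []func()
	isLeader    atomic.Bool
}

// RegisterLeaderAcquiredCallback registers cb; the returned func deregisters it.
// (The real code uses sentinel-pointer slots shared with the leader-loss path.)
func (e *engine) RegisterLeaderAcquiredCallback(cb func()) (deregister func()) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.acquiredCbs = append(e.acquiredCbs, cb)
	i := len(e.acquiredCbs) - 1
	return func() {
		e.mu.Lock()
		defer e.mu.Unlock()
		if i < len(e.acquiredCbs) {
			e.acquiredCbs[i] = nil // simplified deregistration
		}
	}
}

// refreshStatus fires callbacks only on the not-leader → leader edge, after
// the new leadership state is published, with per-callback panic containment.
func (e *engine) refreshStatus(nowLeader bool) {
	wasLeader := e.isLeader.Load()
	e.isLeader.Store(nowLeader) // publish first so a callback that asks for state sees Leader
	if wasLeader || !nowLeader {
		return
	}
	e.mu.Lock()
	cbs := append([]func(){}, e.acquiredCbs...)
	e.mu.Unlock()
	for _, cb := range cbs {
		if cb == nil {
			continue
		}
		func() {
			defer func() {
				if r := recover(); r != nil {
					slog.Error("leader-acquired callback panicked", "panic", r)
				}
			}()
			cb() // callbacks must be non-blocking by contract
		}()
	}
}
```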
main: SQS leadership-refusal hook
- main_sqs_leadership_refusal.go: installSQSLeadershipRefusal +
installSQSLeadershipRefusalAcrossGroups + partitionedGroupSet.
- On install, if the engine is currently leader of a group
hosting a partitioned queue and the binary lacks htfifo, the
hook calls TransferLeadership immediately. The
leader-acquired observer then keeps catching future
transitions for the same group.
- TransferLeadership runs in a goroutine because the
leader-acquired callback contract is non-blocking — a
synchronous admin RPC inside the callback would stall
refreshStatus.
- sqsLeadershipController is the small interface the helper
accepts (subset of raftengine.Admin) so test doubles don't
have to satisfy the full Admin surface.
- run() wires installSQSLeadershipRefusalAcrossGroups after the
coordinator is built; the composite deregister flows through
cleanup.
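A minimal sketch of the refusal decision described above, with a simplified signature — the real installSQSLeadershipRefusal also takes the group id and the partitioned-group set, and the controller's method names here are assumptions:

```go
package main // assumed; sketch of the decision in main_sqs_leadership_refusal.go

import (
	"context"
	"log/slog"
)

// sqsLeadershipController is named in the PR; the method set shown here is an
// assumption (a subset sufficient for the sketch).
type sqsLeadershipController interface {
	IsLeader() bool
	TransferLeadership(ctx context.Context) error
	RegisterLeaderAcquiredCallback(cb func()) (deregister func())
}

func installSQSLeadershipRefusal(ctx context.Context, admin sqsLeadershipController,
	hostsPartitionedQueue, advertisesHTFIFO bool, logger *slog.Logger) func() {
	if advertisesHTFIFO || !hostsPartitionedQueue || admin == nil {
		return func() {} // nothing to refuse; stay out of the hot path
	}
	refuse := func() {
		// The leader-acquired callback contract is non-blocking, so the admin
		// RPC runs in its own goroutine instead of stalling refreshStatus.
		go func() {
			if err := admin.TransferLeadership(ctx); err != nil {
				logger.Warn("htfifo leadership refusal: transfer failed", "err", err)
			}
		}()
	}
	if admin.IsLeader() { // startup case: already leader of a partitioned-queue group
		refuse()
	}
	return admin.RegisterLeaderAcquiredCallback(refuse) // catches future transitions
}
```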
adapter: AdvertisesHTFIFO + flag flip
- adapter.AdvertisesHTFIFO() reports the htfifo capability flag
so main.go can read it without touching the package-private
constant.
- htfifoCapabilityAdvertised = false → true. Both the routing
wiring (PR 4-B-2 #715) and the leadership-refusal hook (this
PR) are now in the binary, so the design's "marked
htfifo-eligible" bar is met.
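The accessor itself is a one-liner; a sketch (the flag name is from this PR, the accessor body is assumed):

```go
// adapter package sketch.
const htfifoCapabilityAdvertised = true // flipped from false in this PR

// AdvertisesHTFIFO lets main.go read the package-private capability flag.
func AdvertisesHTFIFO() bool { return htfifoCapabilityAdvertised }
```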
What's still gated
PR 5 lifts the PartitionCount > 1 dormancy gate AND wires
PollSQSHTFIFOCapability (PR 4-B-3a #721) into the CreateQueue
gate in the same commit. Until PR 5 lands, no partitioned queue
can land in production — the leadership-refusal hook is dormant
in the happy-path runtime (every binary past this PR advertises
htfifo, and the per-group early return keeps the hook out of the
hot path).
Self-review (per CLAUDE.md)
1. Data loss — control-plane only; no FSM/Pebble/retention path.
The hook calls TransferLeadership which is itself an admin
action with the same data-loss profile as a graceful manual
transfer. No issue.
2. Concurrency / distributed failures — leader-acquired callback
contract mirrors leader-loss (non-blocking, panic-contained,
sentinel-pointer deregister). refuse() offloads the actual
TransferLeadership to a goroutine so refreshStatus stays
non-blocking. Multiple goroutines calling refuse() for the
same group queue serially in raft's admin channel; the worst
case is one redundant transfer attempt, which is idempotent
on the raft side. No issue.
3. Performance — leader-acquired callbacks fire only on the
transition edge (rare event); no per-request hot path cost.
The early return on advertisesHTFIFO=true means
production-binary hosts pay zero overhead. No issue.
4. Data consistency — the hook protects against the §8
downgrade scenario: a node rolled back to a pre-htfifo
binary that still gets elected leader of a partitioned-queue
shard would otherwise read/write under the legacy keyspace
and silently corrupt the queue. The hook steps it down via
TransferLeadership before any client request lands. No issue.
5. Test coverage — 7 raftengine observer tests (mirror of
leader-loss panic / empty / deregister / idempotence / nil
guards / sentinel-pointer disambiguation) + 11 main-side
helper tests (htfifo no-op, no-partitioned-queue no-op,
startup-already-leader refuses, startup-follower waits,
per-acquisition fires, deregister propagates, transfer error
logged, nil-admin safe, partitionedGroupSet flatten / empty /
malformed).
…723)

## Summary

Phase 3.D PR 4-B-3b — closes the routing+leadership-refusal pair the §11 PR 4 contract requires before a binary is "marked htfifo-eligible". With this PR landed, every node's `/sqs_health` advertises `htfifo` and the §8 downgrade-protection safeguard is in place. PR 5 (next) lifts the `PartitionCount > 1` dormancy gate and wires `PollSQSHTFIFOCapability` (#721) into the CreateQueue gate in the same commit.

## What's added

### raftengine: leader-acquired observer (mirror of leader-loss)

- `raftengine.Admin` gains `RegisterLeaderAcquiredCallback`. Same contract as `RegisterLeaderLossCallback` (non-blocking, panic-contained, sentinel-pointer deregister) but fires on the `previous!=Leader → status==Leader` edge in `refreshStatus`.
- The etcd backend's slot-management is now shared between leader-loss and leader-acquired via `registerLeaderCallback` + `gatherLeaderCallbacks` helpers — this satisfies the `dupl` lint check and keeps both paths consistent in one place.
- 7 observer tests mirror the existing leader-loss tests: panic-containment, empty-list safety, deregister removal, idempotence, nil-fn safe, nil-receiver safe, identical-fn disambiguation.

### main: SQS leadership-refusal hook

- `main_sqs_leadership_refusal.go`:
  - `installSQSLeadershipRefusal(ctx, admin, gid, partitionedGroups, advertisesHTFIFO, logger) func()` — startup check + per-acquisition observer.
  - `installSQSLeadershipRefusalAcrossGroups(...)` — composite installer iterating every shard runtime.
  - `partitionedGroupSet(partitionMap, logger)` — flattens `--sqsFifoPartitionMap` into the `{gid → bool}` set the hook consumes.
  - `sqsAdvertisesHTFIFO()` — wraps `adapter.AdvertisesHTFIFO()`.
- `run()` installs the composite refusal across runtimes after the coordinator is built; deregister flows through `cleanup`.
- `TransferLeadership` runs in a goroutine because the leader-acquired callback contract is non-blocking — a synchronous admin RPC inside the callback would stall `refreshStatus`.
- 11 helper tests cover the contract: htfifo no-op, no-partitioned-queue no-op, startup-already-leader refuses, startup-follower waits, per-acquisition fires, deregister propagates, transfer-error logged, nil-admin safe, `partitionedGroupSet` flatten / empty-input / malformed-ref-skip.

### adapter: `AdvertisesHTFIFO` + flag flip

- `adapter.AdvertisesHTFIFO()` exposes the package-private flag.
- `htfifoCapabilityAdvertised = false → true`. Both the routing wiring (PR 4-B-2 #715) and the leadership-refusal hook (this PR) are now in the binary, so the design's "marked htfifo-eligible" bar is met.

## What's still gated

PR 5 lifts the `PartitionCount > 1` dormancy gate AND wires `PollSQSHTFIFOCapability` (#721) into the CreateQueue gate in the same commit. Until PR 5 lands, no partitioned queue can land in production — the leadership-refusal hook is dormant in the happy-path runtime (every binary past this PR advertises `htfifo`, and the per-group early return keeps the hook out of the hot path).

## Test plan

- [x] `go test -race ./internal/raftengine/etcd/` — 7 new + existing tests pass.
- [x] `go test -race ./kv/` — existing 30+ tests pass (verified the tests that previously failed due to the misplaced interface method now pass).
- [x] `go test -race ./adapter/` + `go test -race .` — all pass.
- [x] `golangci-lint ./kv/... ./adapter/... ./internal/raftengine/... .` — clean.

## Self-review (per CLAUDE.md)

1. **Data loss** — control-plane only; no FSM/Pebble/retention path. The hook calls TransferLeadership which is itself an admin action with the same data-loss profile as a graceful manual transfer. No issue.
2. **Concurrency / distributed failures** — leader-acquired callback contract mirrors leader-loss (non-blocking, panic-contained, sentinel-pointer deregister). `refuse()` offloads `TransferLeadership` to a goroutine so `refreshStatus` stays non-blocking. Multiple goroutines firing for the same group serialize on raft's admin channel; worst case is one redundant transfer attempt, which is idempotent on the raft side. No issue.
3. **Performance** — leader-acquired callbacks fire only on the transition edge (rare event); no per-request hot path cost. The early return on `advertisesHTFIFO=true` means production-binary hosts pay zero overhead. No issue.
4. **Data consistency** — the hook protects against the §8 downgrade scenario: a node rolled back to a pre-htfifo binary that still gets elected leader of a partitioned-queue shard would otherwise read/write under the legacy keyspace and silently corrupt the queue. The hook steps it down via TransferLeadership before any client request lands. No issue.
5. **Test coverage** — 7 raftengine observer tests + 11 main-side helper tests (18 new). Existing kv lease tests confirm the misplaced-interface bug from first draft was caught and fixed.

## Summary by CodeRabbit

* **New Features**
  * SQS health endpoint now advertises HT-FIFO capability status for client detection.
  * Raft administration introduces leader-acquired callbacks, enabling systems to respond immediately to leadership acquisition events.
  * SQS FIFO queue groups now automatically refuse and transfer leadership when HT-FIFO capability is unavailable.
* **Tests**
  * Added comprehensive test coverage for leader-acquired callback registration, deregistration, panic isolation, and SQS leadership refusal logic.
…(Phase 3.D PR 5b-2) (#732)

## Summary

Stage 2 of Phase 3.D PR 5b: wire `SendMessage` / `ReceiveMessage` / `DeleteMessage` / `ChangeMessageVisibility` through the per-key dispatch helpers landed in PR 5b-1 (#731). The §11 PR 2 dormancy gate still rejects `PartitionCount > 1` at `CreateQueue`, so production queues stay on the legacy keyspace and observable behaviour is byte-identical. **PR 5b-3 lifts the gate atomically with the capability check.**

## What's wired

### Send paths

- `sendMessage`, `sendBatchStandardOnce`: partition is a compile-time `0` (Standard queues reject `PartitionCount > 1` via the cross-attribute validator). Dispatch helpers route to the legacy keyspace on the steady-state path.
- `sendFifoMessage`: hashes `MessageGroupId` once via `partitionFor` and threads the result through every key construction (data, vis, byage, dedup, group-lock).

### Receive fanout

- `scanAndDeliverOnce` wraps the scan loop with a per-partition fanout over `effectivePartitionCount(meta)` iterations. The shared wall-clock + per-call `Max` budget caps apply across the whole call, not per-partition.
- `sqsMsgCandidate` carries a new `partition` field stamped at scan time, so `loadCandidateRecord` / `expireMessage` / `commitReceiveRotation` / `classifyFifoGroupLock` all route to the partition the message was originally stored under.

### Delete + ChangeMessageVisibility

- `handle.Partition` from the v2 receipt handle drives every key construction.
- `loadMessageForDelete` / `loadAndVerifyMessage` invoke the new `validateReceiptHandleVersion` once meta is in scope.

### Receipt-handle dispatch (the contract shift)

- `encodeReceiptHandleDispatch(meta, partition, ...)` is the single point that picks v1 vs v2.
- `decodeClientReceiptHandle` is now version-agnostic; the queue-aware version check moved to `validateReceiptHandleVersion`. v1-on-partitioned and v2-on-non-partitioned both surface as `ReceiptHandleIsInvalid`, **preserving the dormancy promise** (no v2 wire format observable from the public API) under the new contract. Since the §11 PR 2 dormancy gate keeps every production queue non-partitioned in this PR, every v2 handle still surfaces as `ReceiptHandleIsInvalid` downstream — exactly the PR 5a observable behaviour.

### Reaper

The reaper iterates legacy byAge keys only — partition-iterating enumeration ships in the later partition-reaper PR. `buildReapOps` / `reapOneRecord` pass `nil` meta + `partition 0` through dispatch helpers so the keys are byte-identical to the pre-PR-5b layout.

## What does NOT change yet

- `CreateQueue` still rejects `PartitionCount > 1` with `InvalidAttributeValue` (the §11 PR 2 dormancy gate, lifted atomically in PR 5b-3).
- Production queues never enter the partitioned dispatch branches in this PR — observable behaviour is byte-identical.
- The reaper does not yet enumerate partitioned byAge keys (a later PR).

## Test plan

10 new tests across the contract surface.

**Unit tests** (`adapter/sqs_keys_dispatch_v2_test.go`):

- [x] `TestEncodeReceiptHandleDispatch_PicksVersionByPartitionCount` — pins the version dispatch decision across `nil` / `0` / `1` / `4` metas.
- [x] `TestEncodeReceiptHandleDispatch_LegacyByteIdenticalToV1` — protects the byte-identical guarantee on legacy queues.
- [x] `TestEncodeReceiptHandleDispatch_PerQueueUsesV2` — `perQueue + PartitionCount=4` still produces v2 handles (keyspace is partitioned even when `partitionFor` collapses every group to partition 0).
- [x] `TestValidateReceiptHandleVersion_QueueAwareRules` — 10 sub-cases covering the (`meta.PartitionCount` × `handle.Version`) matrix, including `perQueue + PartitionCount=4`.
- [x] `TestValidateReceiptHandleVersion_NilHandle` — defensive nil branch.
- [x] `TestValidateReceiptHandleVersion_RejectsV2OnNonPartitioned` — named regression for the dormancy guarantee under the new contract.
- [x] `TestSQSMsgVisScanBoundsDispatch_LegacyMatchesLegacy` — byte-identical to legacy `sqsMsgVisScanBounds` on legacy meta.
- [x] `TestSQSMsgVisScanBoundsDispatch_PartitionedUsesPartitionedPrefix` — different partitions yield disjoint scan ranges.
- [x] `TestSQSMsgVisScanBoundsDispatch_PerQueueOnPartitionedKeyspace` — pins the **PR #731 round 2 forward-note invariant**: `perQueue + PartitionCount=4` keeps the partitioned vis prefix at partition 0; collapsing to legacy would silently strand send writes.

**Integration tests** (`adapter/sqs_partitioned_dispatch_test.go`) install a partitioned meta directly on a queue created via the public API, short-circuiting the dormancy gate without disabling it for production `CreateQueue`:

- [x] `TestSQSServer_PartitionedFIFO_SendReceiveDeleteRoundTrip` — end-to-end smoke test: send 6 groups, receive surfaces all via fanout, every handle is v2, delete via v2 handle, queue is empty afterwards, legacy keyspace stays empty.
- [x] `TestSQSServer_PartitionedFIFO_RejectsV1Handle` — forged v1 handle on a partitioned queue surfaces as `ReceiptHandleIsInvalid` via `DeleteMessage` and `ChangeMessageVisibility`.
- [x] `TestSQSServer_PartitionedFIFO_PerQueueCollapsesToPartitionZero` — `perQueue + PartitionCount=4` receive surfaces every message in one fanout pass; every v2 handle records `Partition=0`.

Updated `TestDecodeClientReceiptHandle_RejectsV2` → `TestDecodeClientReceiptHandle_AcceptsV2` to reflect the contract shift (rejection moved from API boundary to meta-aware `validateReceiptHandleVersion`).

- [x] `go test -race ./adapter/...` (targeted SQS scope) clean.
- [x] `golangci-lint run ./adapter/...` clean.

## Self-review (per CLAUDE.md)

1. **Data loss** — Dispatch helpers byte-identical on legacy queues (`PartitionCount<=1` routes to legacy constructors). DLQ FIFO computes `dlqPartition` via `partitionFor(dlqMeta, srcRec.MessageGroupId)` so cross-queue redrive lands in the right partition. Receive fanout scans every partition. No issue.
2. **Concurrency** — Receive fanout iterates partitions sequentially under one shared wall-clock + max budget, so the original per-call SLA is preserved. `meta` is loaded once and passed by pointer through helpers; no concurrent meta refresh in a single call. `validateReceiptHandleVersion` runs after the gen check so a `SetQueueAttributes` race cannot flip the answer (`PartitionCount` is immutable). `go test -race` clean.
3. **Performance** — Hot path adds 1 nil-check + 1 `PartitionCount` compare per dispatch. Receive fanout iterates `effectivePartitionCount(meta)` times: 1 on legacy / `perQueue`, N on `perMessageGroupId` with N partitions splitting roughly 1/N of the messages each.
4. **Data consistency** — Receipt handle's recorded partition matches the storage partition (`commitReceiveRotation` uses `cand.partition` for both `newVisKey` and the handle). Cross-version handles rejected as `ReceiptHandleIsInvalid` — no leak between keyspaces. `perQueue + PartitionCount=4` invariant pinned by both unit and integration tests.
5. **Test coverage** — 10 new tests across the contract surface. Existing legacy-path SQS tests pass unchanged.

## Caller audit (semantic-change discipline)

`decodeClientReceiptHandle` semantics changed from "reject all v2" to "decode any version, defer version validation to meta-aware caller". All 3 production callers audited via grep:

- `parseQueueAndReceipt` (`sqs_messages.go:1576`) → `deleteMessageWithRetry` → `loadMessageForDelete` → `validateReceiptHandleVersion` after meta load. ✅
- `DeleteMessageBatch` entry (`sqs_messages_batch.go:475`) → `deleteMessageWithRetry` → same as above. ✅
- `ChangeMessageVisibilityBatch` entry (`sqs_messages_batch.go:575`) → `changeVisibilityWithRetry` → `loadAndVerifyMessage` → `validateReceiptHandleVersion`. ✅

The dormancy guarantee (v2 → `ReceiptHandleIsInvalid` on non-partitioned) is preserved across every caller path.

## Follow-ups

- PR 5b-3 (next): atomic gate-lift + `CreateQueue` capability check via `PollSQSHTFIFOCapability` (#721). Removes the `validatePartitionDormancyGate` call and verifies cluster-wide HT-FIFO support before accepting `PartitionCount > 1`.
- PR 6: partition-iterating reaper (drops the "legacy byAge only" caveat in `buildReapOps`).

## Summary by CodeRabbit

* **New Features**
  * Partition-aware message routing for FIFO and standard queues.
  * Receipt-handle v2 introduced for partitioned queues; v1 retained for non-partitioned queues.
  * Per-queue receive fanout rotation to round-robin partitions.
* **Bug Fixes**
  * Stronger queue-aware receipt-handle validation and bounds checking.
  * Partition-aware fixes across send/receive/delete/change-visibility/redrive/reaper flows.
  * Per-queue fanout state reset on queue create/delete.
* **Tests**
  * Extensive partitioned dispatch, receipt-handle, visibility-scan, dedup, rotation, and end-to-end tests.
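A minimal sketch of the partition selection described above — the hash choice and struct field names are assumptions; the real helpers are `partitionFor` and `effectivePartitionCount` in the adapter package:

```go
package adapter // assumed package

import "hash/fnv"

// queueMeta stands in for the real queue meta; field names beyond
// PartitionCount are assumptions.
type queueMeta struct {
	PartitionCount int
	PartitionMode  string // "perMessageGroupId" or "perQueue"
}

// effectivePartitionCount: 1 on legacy and perQueue queues, N otherwise.
func effectivePartitionCount(meta *queueMeta) int {
	if meta == nil || meta.PartitionCount <= 1 || meta.PartitionMode == "perQueue" {
		return 1
	}
	return meta.PartitionCount
}

// partitionFor: perQueue collapses every group to partition 0; otherwise the
// MessageGroupId is hashed once and the result reused for every key construction.
func partitionFor(meta *queueMeta, messageGroupID string) int {
	if effectivePartitionCount(meta) == 1 {
		return 0
	}
	h := fnv.New32a() // hash choice is an assumption
	_, _ = h.Write([]byte(messageGroupID))
	return int(h.Sum32() % uint32(meta.PartitionCount))
}
```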
… round 2)

Claude review on round 1 flagged that several doc comments still described the §11 PR 2 dormancy gate as if it were live, even though PR 5b-3 has lifted it and replaced it with the cluster-wide htfifo capability check. Documentation-only sweep — no behavior change.

Updated:

- adapter/sqs_catalog.go (sqsQueueMeta.PartitionCount field doc): was "PR 2 of the rollout introduces this field but a temporary CreateQueue gate rejects PartitionCount > 1 until PR 5 lifts the gate atomically with the data-plane fanout — so the schema exists but no partitioned data can land before the data plane is wired". Now describes validateHTFIFOCapability in tryCreateQueueOnce as the live guard.
- adapter/sqs_catalog.go (parseAttributesIntoMeta comment): swapped the dormancy-gate reference for the capability gate.
- adapter/sqs.go (sqsCapabilityHTFIFO and htfifoCapabilityAdvertised doc blocks): three stale references updated. The "PR 5 will use this for the CreateQueue capability gate" forward-reference is now "merged via #721 (PR 4-B-3a) and now consumed by validateHTFIFOCapability in the CreateQueue gate (PR 5b-3)". The block describing PR 5 lifting the dormancy gate is now factual past-tense.
- adapter/sqs_keys.go (partitioned key family doc): swapped the "dormancy gate currently rejects" sentence for the validateHTFIFOCapability gate.
- adapter/sqs_messages.go (decodeClientReceiptHandle doc): the "PR 5b-3 lifts the gate together with the capability check" was itself stale — PR 5b-3 has shipped. Updated to describe the current routing through validateReceiptHandleVersion.
- adapter/sqs_partitioning_test.go (two test docstrings): "after PR 5 lifts the dormancy gate" / "the dormancy gate runs separately on CreateQueue and rejects this at the wire today" updated to refer to the capability gate.

Verified by build + golangci-lint (0 issues) + targeted SQS test sweep (-race) all clean.
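A hypothetical sketch of how the CreateQueue guard named above could consume the poller, using the post-refactor PollerConfig signature — only the validateHTFIFOCapability name and the fail-closed rule come from this thread; the signature, peer-list source, and error text are assumptions:

```go
package adapter // assumed

import (
	"context"
	"fmt"
)

func validateHTFIFOCapability(ctx context.Context, partitionCount int, peers []string) error {
	if partitionCount <= 1 {
		return nil // non-partitioned queues never need the cluster-wide check
	}
	report := PollSQSHTFIFOCapability(ctx, peers, PollerConfig{}) // zero config → safe defaults
	if !report.AllAdvertise {
		// Fail closed (§8.5): an unreachable peer or one without htfifo
		// blocks PartitionCount > 1.
		return fmt.Errorf("PartitionCount > 1 requires every node to advertise htfifo; peer detail: %+v", report.Peers)
	}
	return nil
}
```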
Summary

Phase 3.D PR 4-B-3a — adds the stateless htfifo capability poller that PR 5's CreateQueue gate consumes. Stacks on the now-merged #715 (PR 4-B-2, partition resolver). This PR is purely additive: new helper file, new test file, no existing code touched. Next is PR 4-B-3b (leadership-refusal hook + flag flip).

What's added

`adapter/sqs_capability_poller.go`:

- `HTFIFOCapabilityReport{AllAdvertise, Peers}` — binary go/no-go signal + per-peer detail for operator triage.
- `HTFIFOCapabilityPeerStatus{Address, HasHTFIFO, Capabilities, Error}` — one peer's polling result.
- `PollSQSHTFIFOCapability(ctx, client, peers)` — concurrent goroutine-per-peer poll, indexed-channel result aggregation (race-free).
- `defaultSQSCapabilityPollTimeout = 3s` so a single hung peer can't stall the cluster-wide poll.
- Body capped via `io.LimitReader` so a misconfigured peer can't drain memory.
- Bare `host:port` and full `http://…` / `https://…` URLs both accepted.
- Empty peer list → `AllAdvertise=true` (caller validates list completeness).

What's NOT added (deferred)

- `htfifoCapabilityAdvertised` stays `false`. PR 4-B-3b adds the §8 leadership-refusal hook + per-acquisition observer in `kv/raftengine/etcd` and flips the flag.
- `CreateQueue` does NOT yet call this helper. PR 5 lifts the `PartitionCount > 1` dormancy gate AND wires the capability check in the same commit (per the §11 rollout plan's "gate-and-lift atomically" rule).

Test plan

9 top-level tests covering the contract surface:

- `TestPollSQSHTFIFOCapability_AllAdvertise` — happy path, multiple peers.
- `TestPollSQSHTFIFOCapability_OneMissingFailsClosed` — old-binary peer with empty capabilities drops `AllAdvertise`.
- `TestPollSQSHTFIFOCapability_HTTPErrorFailsClosed` — HTTP 500, connection refused, malformed JSON all surface as `Error`.
- `TestPollSQSHTFIFOCapability_TimeoutFailsClosed` — hung peer respects per-peer timeout, full poll bounded.
- `TestPollSQSHTFIFOCapability_EmptyPeersIsVacuouslyTrue` — empty peer list contract.
- `TestPollSQSHTFIFOCapability_EmptyPeerAddressFailsClosed` — `""` entry in peers slice surfaces explicit Error.
- `TestPollSQSHTFIFOCapability_FullURLPeer` — `http://` and `https://` URLs accepted alongside bare `host:port`.
- `TestPollSQSHTFIFOCapability_ConcurrentPolling` — 5×200ms peers finish in well under 1s.
- `TestPollSQSHTFIFOCapability_RespectsBodyLimit` — 10 KiB response truncated mid-string surfaces as JSON parse error, not garbage decode.
- `TestBuildSQSHealthURL` — URL construction edge cases.
- `go test -race ./adapter/...` pass.
- `golangci-lint ./adapter/...` clean.

Self-review (per CLAUDE.md)

- Each peer is polled under its own `context.WithTimeout` so a slow peer can't stall the rest. Body capped via `io.LimitReader`. No issue.